GC3Pie: A Python framework for high-throughput computing
Abstract
This paper presents GC3Pie [7], a Python library that eases the development of scalable and robust high-throughput data analysis tools. Most current distributed computing middleware, as well as most in-house scripts, fall short of the scalability and reliability required by the ever-growing demand for large-scale data analysis. GC3Pie provides mechanisms to automate the execution and monitoring of large collections of applications while also providing simple data structures and interfaces to steer the behaviour of the underlying system from an application-centric perspective. The goal of GC3Pie is to embody the execution and monitoring logic common to large data analyses, while moving most decision-making logic to the application level: for example, the reaction to certain types of failure, the validation of the application execution, or the brokering of computing resources driven by application fidelity metrics. This makes it possible to write application-specific tools that take full control of the underlying computing and data infrastructure. Current middleware stacks, by contrast, try to embody full control of the execution logic. This reduces the flexibility of the entire system, because it prevents applications from defining their own expected behaviour of the system.
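The division of labour described above (a generic engine that executes and monitors, and applications that decide how to broker, validate and react to failure) can be illustrated with a minimal, stdlib-only sketch. The classes `Engine`, `Task` and `Resource` and the hook names `select_resource`, `validate` and `on_failure` are hypothetical, chosen for illustration only; they are not GC3Pie's actual API. Remote execution is simulated by a random failure with a per-resource probability.

```python
import random
from dataclasses import dataclass


@dataclass
class Resource:
    # Hypothetical computing resource with an application-defined fidelity score.
    name: str
    fidelity: float
    failure_rate: float  # probability that a simulated run fails


class Task:
    """Hypothetical application: the engine runs it, the task makes the decisions."""

    max_retries = 3

    def __init__(self, task_id):
        self.task_id = task_id
        self.attempts = 0
        self.state = "NEW"
        self.result = None

    def select_resource(self, resources):
        # Application-driven brokering: pick the resource with the highest fidelity.
        return max(resources, key=lambda r: r.fidelity)

    def validate(self, result):
        # Application-specific validation of a finished run.
        return result is not None and result >= 0

    def on_failure(self):
        # Application-specific reaction to failure: retry a bounded number of times.
        return self.attempts < self.max_retries


class Engine:
    """Generic execution and monitoring loop shared by all applications."""

    def __init__(self, resources, rng):
        self.resources = resources
        self.rng = rng

    def _execute(self, task, resource):
        # Simulated remote execution: returns None on failure.
        if self.rng.random() < resource.failure_rate:
            return None
        return task.task_id * 2

    def run(self, tasks):
        pending = list(tasks)
        while pending:
            task = pending.pop(0)
            resource = task.select_resource(self.resources)
            task.attempts += 1
            result = self._execute(task, resource)
            if task.validate(result):
                task.state, task.result = "DONE", result
            elif task.on_failure():
                pending.append(task)  # resubmit later
            else:
                task.state = "FAILED"
        return tasks
```

The engine never decides on its own whether a run succeeded or should be retried; it only asks the task. Changing the retry policy or the brokering criterion therefore means overriding a method in the application, not modifying the engine.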